#### 1

# ECE385 Experiment #9

Eric Meyers, Ryan Helsdingen

Section ABG; TAs: Ben Delay, Shuo Liu

April 13th, 2016

emeyer7, helsdin2

# I. Introduction

The purpose of this lab was to explore encryption/decryption techniques using the Advanced Encryption Standard (AES). The team wrote both encryption and decryption algorithms that runs on software and hardware respectively. The advantages/disadvantage of these two techniques will be analyzed in the post-lab section.

#### II. DESCRIPTION OF CIRCUIT

The encryption algorithm will be performed on a NIOS-II processor and programmed in C, whereas the decryption algorithm will be performed on a Cyclone IV FPGA and programmed in System Verilog.

The C program onboard the NIOS-II processor will communicate with the hardware on the Altera DE2-115 board and transfer both the encrypted message and key to the hardware. The hardware will then proceed to decrypt this message using the provided key and display the result on the hex display.

# III. PURPOSE OF MODULES

The AES encryption/decryption algorithm used several modules. However, these can be broken down into software modules and hardware modules. The only software module was the NIOS system used to create the software encryption algorithm in C. The following were the hardware modules and these will be explained in detail when necessary:

- lab9 (top-level)
- aes\_controller
- AES
- io module
- KeyExpansion
- AddRoundKey
- SubBytes, InvSubBytes

- InvSubBytes\_16
- ShiftRows, InvShiftRows
- MixColumns, InvMixColumns
- HexDriver

First, the encryption algorithm, as stated before was programmed in C on a NIOS-II processor. This processor is contained within the "NIOS system".

The NIOS system developed was similar to that of the one created in lab8 except there are 4 PIO modules to allow communication to/from the hardware (to\_sw\_sig, to\_hw\_sig, to\_sw\_port, to\_hw\_port). A JTAG UART peripheral was used to allow debugging upon the host computer and to allow user input to the C program.

lab9

This is the top-level module containing the connections from the nios\_system, aes\_controller, io\_module, and the hex display.

aes\_controller

The aes\_controller is the module that controls the operation of the AES module. It determines if the AES decryption algorithm is in a IDLE state, COMPUTE state, or READY state. This is accomplished through a simple state machine that was provided. The only changes the team made is instead of relying on a counter to determine when the algorithm was done, this was instead done inside the algorithm itself and output through the module into the controller.

AES

The AES module is the main decryption algorithm implementation. It contains inputs of Reset and Run, along with 128-bits of encrypted text and 128-bits of a key. It outputs a 128-bit value containing the plaintext of the encrypted input after finishing. The main algorithm

```
module AES (input [127:0] Plaintext,
Cipherkey,
input clk,
Reset,
Run,
output logic [127:0] Ciphertext,
output Ready );
```

is explained in the lab manual, as well as below in the State Diagram section.

The module essentially loops through 10 rounds of calculations with each round containing the correct cycle of calculations as defined by the state machine.

### io module

```
module io module (
                                              clk,
                    input
                     input
                                               reset_n,
                     output logic [1:0]
                                               to_sw_sig,
                     output logic [7:0]
                                               to_sw_port,
                     input [1:0]
                                               to_hw_sig,
                     input [7:0]
                                               to hw port,
                     output logic [127:0]
                                               msg_en,
                                              key,
                     output logic [127:0]
                                              msg_de,
                     input [127:0]
                     output logic
                                               io_ready,
                                              aes_ready,
                     input
                     output logic[6:0]
                                              state_leds
```

The io\_module provides the communication between the hardware and software. The module follows a 100-state machine diagram shown in Figure 1. Inputs include the decrypted message, hardware port and sig signals to control the movement of the message, clock, reset, and AES ready bit. Outputs include 128-bit encrypted message and key, software port and sig signals, IO ready bit, and output signals to show the state number on the LEDs.

#### **KeyExpansion**

```
module KeyExpansion (input clk, input [0:127] Cipherkey, output [0:1407] KeySchedule );
```

The KeyExpansion module takes in the Cipher Key to create the first round key and then loops 10 more times creating 128-bit Round Keys each based off the previous key. The 11 Round Keys are then stored into a 1408-bit Key Schedule.

#### **AddRoundKey**

The AddRoundKey module fetches a 4-word, 128-bit Round key from the pre-computed Key Schedule and bitwise XORs with the corresponding byte from the updating 128-bit state. ?? shows a diagram of the AddRoundKey module.



# SubBytes, InvSubBytes

```
module SubBytes ( input clk, input [0:7] in , output logic [0:7] out );
```

SubBytes takes each byte of the updating state and transforms it by taking the multiplicative inverse of the Rijindael's finite field. The transformation has been simplified into a black-box designated as the S-box (substitution box). Inputs include the 256 byte array and clock. Outputs include the transformed 256 byte array. Figure ?? portrays this module as a 4x4 matrix for sample.



The InvSubBytes module is similar to the SubBytes module except that the S-box is replaced by its inverse to reverse the encryption process.

#### InvSubBytes\_16

```
module InvSubBytes_16(input logic clk,
input logic [0:127] in,
output logic [0:127] out);
```

InvSubBytes\_16 is a submodule of InvSubBytes. It takes in the entire state during the InvSubBytes round

and performs the appropriate substitution for each row and column.

# ShiftRows, InvShiftRows

```
module InvShiftRows(input logic [0:127] in, output logic[0:127] out);
```

The ShiftRows module takes in the 128-bit state and shifts the elements in each row n of the matrix by n-1 to the left. Figure ?? shows a diagram of ShiftRows.

| State      |                  |                  |                  |                  |           | Result           |                  |                  |                  |
|------------|------------------|------------------|------------------|------------------|-----------|------------------|------------------|------------------|------------------|
| no shift   | a <sub>0,0</sub> | a <sub>0,1</sub> | a <sub>0,2</sub> | a <sub>0,3</sub> |           | a <sub>0,0</sub> | a <sub>0,1</sub> | a <sub>0,2</sub> | a <sub>0,3</sub> |
| 1 shift 🧲  | a <sub>1,0</sub> | $a_{\rm l,i}$    | a <sub>1,2</sub> | a <sub>1,3</sub> | ShiftRows | $a_{\rm i,i}$    | a <sub>1,2</sub> | a <sub>1,3</sub> | a <sub>1,0</sub> |
| 2 shifts 🧲 | a <sub>2,0</sub> | a <sub>2,1</sub> | $a_{2,2}$        | a <sub>2,3</sub> |           | a <sub>2,2</sub> | a <sub>2,3</sub> | a <sub>2,0</sub> | a <sub>2,1</sub> |
| 3 shifts 🧲 | a <sub>3,0</sub> | $a_{i,i}$        | a <sub>3,2</sub> | a <sub>3,3</sub> |           | a <sub>3,3</sub> | a <sub>3,0</sub> | a <sub>3,1</sub> | a <sub>3,2</sub> |

The InvShiftRows module does exactly whatt he ShiftRows module does except it reverses the direction such the row n is right-circularly shifted by n-1 times. MixColumns.InvMixColumns

```
module InvMixColumns ( input [0:31] in , output logic [0:31] out );
```

The MixColumns module takes each of the four words from the state individually and goes through invertible linear tranformations over  $GF(2^8)$  including multiplication by a fixed polynomial matrix to linearly combine the four bytes of each word to form a new Word.

# **HexDriver**

This module contains the logic necessary to output values in binary/decimal in hex on the hex displays on board the Altera De2-115.

# IV. SOFTWARE/HARDWARE STATE DIAGRAM

The software/hardware state diagram is handled in io\_module and is shown in Figure 1.

#### V. DECRYPTION STATE DIAGRAM

The decryption state diagram is split up into two FSMs for simplicity. One is referred to as the Calculation FSM and the other is referred to as the Round FSM. There are a total of 10 rounds with an extra round to ensure the key\_expansion algorithm performs. Each round a series of calculations are performed as determined in the algorithm outlined in the description PDF of this lab. The Round FSM is shown in Figure 2 and the Calculation FSM is shown in Figure 4.

#### VI. SCHEMATIC/BLOCK DIAGRAM

#### VII. ANNOTATED PRE-LAB WAVEFORMS

The annotated pre-lab waveforms can be found in Figure 5, 6, 7, and 8 for a 2000ns period. The ending decrypted value given plaintext of "daec3055df058e1c39e814ea76f6747e" with a key of "0102030405060708090a0b0c0d0e0f" is "ece298ece298...dc".

VIII. POST LAB

| Resource      | Value |  |  |
|---------------|-------|--|--|
| LUT           |       |  |  |
| DSP           |       |  |  |
| Memory (BRAM) |       |  |  |
| Flip-Flop     |       |  |  |
| Frequency     | MHz   |  |  |
| Static Power  | mW    |  |  |
| Dynamic Power | mW    |  |  |
| Total Power   | mW    |  |  |
|               |       |  |  |

TABLE I: Design Statistics

- 1) Which would you expect to be faster to complete encryption/decryption, the software or hardware? Is this what your results show?
  - The faster encryption/decryption is expected to be the hardware because programming in software (specifically in C) is meant for generalized processors and thus must go through more clock cycles to do more compared to that of the hardware. The hardware is dedicated to the task at hand, and thus is faster than the software.

Software Speed: MB/s

Hardware Speed: MB/s (simulation)

- 2) If you wanted to speed up the hardware, what would you do? (Note: restrictions of this lab do not apply to answer this question)
  - Instead of containing separate modules for each of the operations, these can be combined into a single module and the entire computation for one round can be done in a single state. This would significantly decrease the amount of time taken for one state to process.

# IX. CONCLUSION

Overall, this lab proved to be very difficult from both a hardware and software perspective. Programming the encryption algorithm in C was difficult for many reasons. Working with two-dimensional arrays is never an easy task in C, and this lab seemed to put this skill to the test. The notation used throughout the algorithm is difficult to understand fully what should be passed into certain functions. However, once a basic working model of the encryption algorithm was developed, it was optimized for patterns, and condensed and therefore much easier to understand.

The hardware decryption was just as difficult as the software. The finite state machines that were developed took a large amount of time to debug using ModelSim. The io\_module used to pass information to/from the software proved to be a large time-sing in and of itself.

In the end, the team did not receive full points for demo due to a glitch in passing the data to/from software. However, partial credit was received for showing simulations...

Overall this lab was one of the most difficult ECE385 labs, however it was one of the most rewarding to understand in the end. AES encryption/decryption is now very clear to the team.

# X. FIGURES



Fig. 1: IO Module Hardware/Software State Diagram



Fig. 2: Decryption Round State Diagram



Fig. 3: Calculation State Diagram

Fig. 4: Calculation State Diagram



Fig. 5: Simulation 0 - 500ns



Fig. 6: Simulation 500 - 1000ns



Fig. 7: Simulation 1000 - 1500ns



Fig. 8: Simulation 1500-2000ns

APPENDIX